1. Introduction and Summary

We consider in this paper two highly structured problems of optimal
stochastic control. The two problems will be precisely formulated, and
optimal control policies of a simple form will be explicitly computed, in
Sections 3 and 4. In this section we present an informal description of
each problem and its solution, suppressing all technical detail.

Let $X = \{X(t),\ t \ge 0\}$ be a Brownian Motion with starting state
$x \ge 0$, drift $\mu$ and variance $\sigma^2 > 0$. Thus $E[X(t)] = x + \mu t$ and
$\mathrm{Var}[X(t)] = \sigma^2 t$. We define a control to be a non-decreasing process
$Y = \{Y(t),\ t \ge 0\}$, with $Y(0) \ge 0$, which is a non-anticipating functional
(of $X$). Thus, for each $t \ge 0$, the partial control history $\{Y(u),\ 0 \le u \le t\}$
may depend on $\{X(u),\ 0 \le u \le t\}$ and possibly on other information as well,
but it may not depend on $\{X(t+u) - X(t),\ u \ge 0\}$. (See Section 2 for a
precise definition.) In each of our problems, the objective is to find
an input control $Y$ and an output control $Z$ which maximize expected
discounted reward (over an infinite planning horizon) subject to the
constraint that $W(t) = X(t) + Y(t) - Z(t) \ge 0$ for all $t \ge 0$ (almost
surely). It is the hypothesized structure of costs and rewards that
differs in the two problems.
Given a constant $k > 1$, the objective in our first problem is to
find an admissible control policy $(Y,Z)$ which maximizes

    $E\left[Z(0) + \int_0^\infty e^{-\alpha t}\,dZ(t)\right] - k\,E\left[Y(0) + \int_0^\infty e^{-\alpha t}\,dY(t)\right]$,

where $\alpha > 0$ is the interest rate. By way of interpretation, we imagine
a storage system (such as an inventory or a bank account) whose content
evolves as the Brownian Motion $X$ in the absence of any control. In
particular, $X(0)$ represents the initial content of the system. The
controller may at any time withdraw material from the system, and $Z(t)$
represents the total withdrawal during the interval $[0,t]$, or cumulative
output up to time $t$. He receives a reward of one dollar for each unit
of material withdrawn. If the content of the system falls to zero, however,
then the controller is obliged to inject material into the system so as
to keep the net content positive, and he incurs a cost of $k > 1$ dollars
for each unit of material injected. We interpret $Y(t)$ as the total
injection during the interval $[0,t]$, or the cumulative input up to time
$t$. We call $W = X + Y - Z$ the controlled process and $X$ the uncontrolled
process.

In Section 3 it will be shown that an optimal policy for this
first problem is the minimal pair of controls $(Y,Z)$ which achieves
$0 \le W(t) \le S$ for all $t \ge 0$ (almost surely), where $S$ is the unique
positive solution of a certain transcendental equation. Thus the
controller should withdraw only as much material as is required to keep
the net content below $S$, and he should inject the minimum amount
necessary to keep the net content positive. The optimal controls are
system $W(t)$. We also discuss the relationship between our model(s)
and other (approximate) diffusion formulations that have been suggested
for such problems. In Section 6 we discuss the difference between our
formulation and various other theories of optimal stochastic control.


2. Preliminaries

Let $\mathbb{R} = (-\infty, \infty)$, $\mathbb{R}_+ = [0, \infty)$, $I = \{0, 1, \ldots\}$ and $I_+ = \{1, 2, \ldots\}$.
Throughout the paper, let $\alpha > 0$, $\mu \in \mathbb{R}$, $\sigma^2 > 0$, $k > 1$ and $K > 0$ be
fixed constants. We assume a measurable space $(\Omega, \mathcal{F})$ on which is
defined a family of probability measures $\{P_x,\ x \in \mathbb{R}\}$ and a process
$X = \{X(t),\ t \ge 0\}$ such that $X$ is Brownian Motion with drift $\mu$,
variance $\sigma^2 > 0$, and starting state $x$ with respect to $P_x$ ($x \in \mathbb{R}$).
We denote by $E_x$ the expectation operator associated with $P_x$. The
following proposition follows easily from standard properties of
Brownian Motion.


Proposition 1. $E_x \sup\{e^{-\alpha t} X(t) : t \ge 0\} < \infty$, $x \in \mathbb{R}$.


We further assume the existence of an increasing family of sub-$\sigma$-algebras
$\{\mathcal{F}_t,\ t \ge 0\}$ such that $X$ is adapted to $\{\mathcal{F}_t\}$ and
$X(t+u) - X(t)$ is independent (with respect to $P_x$ for all $x \in \mathbb{R}$)
of $\mathcal{F}_t$ for all $t \ge 0$ and $u \ge 0$. We say that a random variable $T$
is a stopping time if $P_x\{0 \le T < \infty\} = 1$ for all $x \in \mathbb{R}$ and
$\{T \le t\} \in \mathcal{F}_t$ for all $t \ge 0$. Using the separability of $X$ and our
assumptions on $\{\mathcal{F}_t\}$, one easily obtains the following by a standard type
of argument.


Proposition 2. If $T$ is a stopping time and $Z$ is measurable $\mathcal{F}_T$, then,
for all $x \in \mathbb{R}$,

    $P_x\{Z + X(T+t) \le y \mid \mathcal{F}_T\} = P_{Z+X(T)}\{X(t) \le y\}$,  $t \ge 0$, $y \in \mathbb{R}$.


Proposition 3. If $T$ is a stopping time, then

    $\alpha E_x \int_0^T e^{-\alpha t} X(t)\,dt + E_x\, e^{-\alpha T} X(T) = x + \mu[1 - A(x)]/\alpha$,  $x \in \mathbb{R}$,

where $A(x) = E_x\, e^{-\alpha T}$, $x \in \mathbb{R}$.


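Proof: Let $f(x) = \alpha E_x \int_0^\infty e^{-\alpha t} X(t)\,dt$, $x \in \mathbb{R}$. Then

    $f(x) = \alpha \int_0^\infty e^{-\alpha t} E_x X(t)\,dt = \alpha \int_0^\infty e^{-\alpha t}(x + \mu t)\,dt = x + \mu/\alpha$

by Fubini's Theorem. Using this and Proposition 2, we have

    $x + \mu/\alpha - \alpha E_x \int_0^T e^{-\alpha t} X(t)\,dt = f(x) - \alpha E_x \int_0^T e^{-\alpha t} X(t)\,dt$
    $\quad = \alpha E_x \int_T^\infty e^{-\alpha t} X(t)\,dt = E_x\{e^{-\alpha T} E_x[\alpha \int_0^\infty e^{-\alpha t} X(T+t)\,dt \mid \mathcal{F}_T]\}$
    $\quad = E_x\, e^{-\alpha T} f(X(T)) = E_x\, e^{-\alpha T}[X(T) + \mu/\alpha]$,

which completes the proof.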
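
The closed form $f(x) = x + \mu/\alpha$ used in this proof can be checked numerically. The following is a minimal sketch, not part of the original argument: the function name, the truncation horizon and the parameter values are illustrative assumptions, and the integral is approximated by a plain trapezoid rule.

```python
import math

def discounted_mean_integral(x, mu, alpha, horizon=200.0, steps=200_000):
    """Trapezoid approximation of alpha * int_0^horizon e^{-alpha t} (x + mu t) dt.

    The integrand uses E_x X(t) = x + mu * t; the infinite range is truncated
    at `horizon` (an assumption; the discarded tail is exponentially small).
    """
    dt = horizon / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * dt
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-alpha * t) * (x + mu * t)
    return alpha * total * dt

# Illustrative values only; they do not come from the paper.
x, mu, alpha = 2.0, 0.5, 0.1
approx = discounted_mean_integral(x, mu, alpha)
exact = x + mu / alpha
```

With these values `approx` and `exact` should agree to several decimal places, which is all the sketch is meant to show.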
If $f$ is a real-valued function on some interval subset of $\mathbb{R}$,
we define $\Gamma f(x) = \mu f'(x) + \frac{1}{2}\sigma^2 f''(x)$ for all $x$ such that $f'(x)$ and
$f''(x)$ exist.


Proposition 4. Let $f : \mathbb{R} \to \mathbb{R}$ be non-decreasing and twice continuously
differentiable with $\Gamma f - \alpha f \le 0$. Assume $x \in \mathbb{R}$ and let $T_*$ and $T^*$ be
stopping times with $P_x\{T_* \le T^*\} = 1$. Finally, let $Z$ be measurable $\mathcal{F}_{T_*}$.
Then

(1)  $E_x\, e^{-\alpha T^*} f(Z + X(T^*)) \le E_x\, e^{-\alpha T_*} f(Z + X(T_*))$,

if both expectations exist and are finite.


Proof: Assume first that $f$ and its first two derivatives are bounded.
We begin by proving a slight generalization of the discounted form of
Dynkin's identity, cf. Breiman (1968), equation (16.61). Let
$g(x) = \Gamma f(x) - \alpha f(x)$ for $x \in \mathbb{R}$. Then a standard result for Markov
processes (in this case the Brownian Motion $X$ with generator $\Gamma$) and
their resolvents gives us

(2)  $f(x) = -E_x \int_0^\infty e^{-\alpha t} g(X(t))\,dt$,  $x \in \mathbb{R}$,

cf. Breiman (1968), Theorem 15.51. Since $Z$ is measurable $\mathcal{F}_{T^*}$, we
can combine (2) with Proposition 2 to obtain

(3)  $E_x \int_{T^*}^\infty e^{-\alpha t} g(Z + X(t))\,dt = E_x\, e^{-\alpha T^*} \int_0^\infty e^{-\alpha t} g(Z + X(T^*+t))\,dt$
     $\quad = E_x\{e^{-\alpha T^*} E_x[\int_0^\infty e^{-\alpha t} g(Z + X(T^*+t))\,dt \mid \mathcal{F}_{T^*}]\}$
     $\quad = E_x\{e^{-\alpha T^*} E_{Z+X(T^*)} \int_0^\infty e^{-\alpha t} g(X(t))\,dt\}$
     $\quad = -E_x\, e^{-\alpha T^*} f(Z + X(T^*))$.

But $Z$ is measurable $\mathcal{F}_{T_*}$ also, so an identical argument gives

(4)  $E_x \int_{T_*}^\infty e^{-\alpha t} g(Z + X(t))\,dt = -E_x\, e^{-\alpha T_*} f(Z + X(T_*))$.

Subtracting (3) from (4) gives

    $E_x[e^{-\alpha T^*} f(Z + X(T^*)) - e^{-\alpha T_*} f(Z + X(T_*))] = E_x \int_{T_*}^{T^*} e^{-\alpha t} g(Z + X(t))\,dt$.

The right side is non-positive, since $g(x) \le 0$ for all $x \in \mathbb{R}$, so the
proposition is proved.

If either $f$ or one of its derivatives is unbounded, we can easily
construct a sequence of bounded functions $f_n$ having two bounded
continuous derivatives and such that $f_n(x) = f(x)$ if $|f(x)| \le n$ and
$|f_n| \le |f|$, $n \in I_+$. (The construction is particularly easy with our
assumption that $f$ is monotone.) We have shown that (1) holds with $f_n$
in place of $f$, and obviously $f_n \to f$, so (1) holds by dominated
convergence if both expectations exist and are finite. This completes the
proof.
We define a non-anticipating functional to be a process
$Z = \{Z(t),\ t \ge 0\}$ on $(\Omega, \mathcal{F})$ taking values in $D[0,\infty)$ and such that
$Z(t)$ is measurable $\mathcal{F}_t$ for each $t \ge 0$. We define $C$ to be the set
of non-anticipating functionals $Z$ which are non-decreasing with
$Z(0) \ge 0$, and we define $C_x$ to be the set of $Z \in C$ such that
$E_x R(Z) < \infty$ ($x \in \mathbb{R}$), where

    $R(Z) = \alpha \int_0^\infty e^{-\alpha t} Z(t)\,dt$,  $Z \in C$.

Then $P_x\{e^{-\alpha t} Z(t) \to 0 \text{ as } t \to \infty\} = 1$ for all $Z \in C_x$, and path-wise
(Riemann-Stieltjes) integration by parts gives

    $P_x\{R(Z) = Z(0) + \int_0^\infty e^{-\alpha t}\,dZ(t)\} = 1$

for all $Z \in C_x$.
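
For a step function the integration-by-parts identity can be verified directly, since both sides have closed forms. The sketch below is illustrative only: the representation of $Z$ as an initial level plus a list of `(time, size)` jumps, and the function names, are assumptions made for the example.

```python
import math

def r_by_time_integral(z0, jumps, alpha):
    """alpha * int_0^inf e^{-alpha t} Z(t) dt for a step function Z.

    Z starts at z0 and increases by `size` at each `time` in `jumps`
    (times strictly positive and increasing). The integral is evaluated
    exactly on each interval where Z is constant.
    """
    times = [0.0] + [t for t, _ in jumps] + [math.inf]
    level, total = z0, 0.0
    for i in range(len(times) - 1):
        if i > 0:
            level += jumps[i - 1][1]           # apply the jump at times[i]
        left = math.exp(-alpha * times[i])
        right = 0.0 if math.isinf(times[i + 1]) else math.exp(-alpha * times[i + 1])
        total += level * (left - right)
    return total

def r_by_parts(z0, jumps, alpha):
    """Z(0) + int_0^inf e^{-alpha t} dZ(t): initial level plus discounted jumps."""
    return z0 + sum(size * math.exp(-alpha * t) for t, size in jumps)

# Illustrative data, not from the paper.
alpha = 0.1
jumps = [(0.5, 1.0), (2.0, 0.25), (7.5, 3.0)]
lhs = r_by_time_integral(1.0, jumps, alpha)
rhs = r_by_parts(1.0, jumps, alpha)
```

The second form is the one that appears in the objective of the first problem: each unit withdrawn or injected at time $t$ is discounted by $e^{-\alpha t}$.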


We define $C^*$ to be the set of (random) step functions in $C$ having
only finitely many jumps in any finite interval, and we take $C^*_x$ to
be the set of $Z \in C^* \cap C_x$ such that $E_x R^*(Z) < \infty$ ($x \in \mathbb{R}$), where

    $R^*(Z) = 1_{\{Z(0) > 0\}} + \sum_{n=1}^\infty e^{-\alpha T_n(Z)}$,

where $T_n(Z)$ is the $n$-th jump time of $Z$.


Proposition 5. $E_x \sup\{e^{-\alpha t} Z(t) : t \ge 0\} < \infty$ if $x \in \mathbb{R}$, $Z \in C_x$.
Proof: Since $Z$ is non-decreasing, we have

    $\int_t^\infty \alpha e^{-\alpha u} Z(u)\,du \ge \int_t^\infty \alpha e^{-\alpha u} Z(t)\,du = e^{-\alpha t} Z(t)$.

Taking the sup of each side over all $t \ge 0$, the left side becomes
$R(Z)$, and the proposition follows from the fact that $E_x R(Z) < \infty$.

Proposition 6. Suppose $f : \mathbb{R}_+ \to \mathbb{R}$ is non-decreasing with $f(x) \le a + bx$
for some $a$ and $b \ge 0$, that $x \ge 0$, and that $Y \in C_x$ and $Z \in C_x$ satisfy

(5)  $P_x\{X(t) + Y(t) - Z(t) \ge 0 \text{ for all } t \ge 0\} = 1$.

Then $E_x \sup\{e^{-\alpha t} f(X(t) + Y(t) - Z(t)) : t \ge 0\} < \infty$.


Proof: Since $f$ is non-decreasing and $Z(\cdot) \ge 0$, we have

    $e^{-\alpha t} f(0) \le e^{-\alpha t} f(X(t) + Y(t) - Z(t)) \le e^{-\alpha t} f(X(t) + Y(t))$
    $\quad \le e^{-\alpha t}[a + bX(t) + bY(t)]$

almost surely (with respect to $P_x$) for all $t \ge 0$. The desired result
then follows immediately from Propositions 1 and 5.


Proposition 7. Let $f$, $x$, $Y$ and $Z$ be as in Proposition 6. If
$\{T_n,\ n \in I\}$ are non-negative and $P_x\{T_n \to \infty\} = 1$, then

    $E_x\, e^{-\alpha T_n} f(X(T_n) + Y(T_n) - Z(T_n)) \to 0$

as $n \to \infty$.




Then for $n \in I$ and $t \in [0, \tau_{n+1})$ we define

    $Y^*(T_n + t) = \sum_{i=1}^n Y_i(\tau_i) + Y_{n+1}(t)$,
    $Z^*(T_n + t) = \sum_{i=1}^n Z_i(\tau_i) + Z_{n+1}(t)$,
    $W^*(T_n + t) = W_{n+1}(t)$,

so that $W^*(t) = X(t) + Y^*(t) - Z^*(t) \ge 0$ for all $t \ge 0$. The initial
time interval $[0, T_1]$ is a period of output control only. There is an
initial output of size $Z^*(0) = Z_1(0) = [X(0) - S]^+$, and during the
remainder of the period the cumulative output $Z^* = Z_1$ increases in the
minimum amounts necessary to maintain $X - Z_1 \le S$. The controlled process
$W_1 = X - Z_1$ has state space $(-\infty, S]$ and $W_1(0) = [X(0) \wedge S]$, and it is
known to behave as the Brownian Motion $X$ modified by an upper
(instantaneously) reflecting barrier at $S$. (We shall use this fact later
without further comment.) The period ends when $W^* = W_1$ hits zero. Each
subsequent interval of the form $(T_{2n}, T_{2n+1}]$ with $n \in I_+$ is similarly
a period of output control only. The controlled process $W^*$ starts in
state $S$, the cumulative input $Y^*$ remains constant, and the cumulative
output $Z^*$ increases in the minimum amounts necessary to maintain
$W^* \le S$.

Each interval of the form $(T_{2n+1}, T_{2n+2}]$ is a period of input
control only. The controlled process $W^*$ starts in state zero and
behaves as the Brownian Motion $X$ modified by a lower reflecting barrier
at zero. The cumulative output $Z^*$ remains constant, and the cumulative
input $Y^*$ increases in the minimum amounts necessary to maintain $W^* \ge 0$.
The period ends when $W^*$ hits $S$. Thus $W^*$ has state space $[0, S]$
and behaves as $X$ modified by reflecting barriers at both boundaries.
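
The two-barrier policy can be illustrated on a discretized path. The following is a rough sketch under stated assumptions, not the authors' construction: time is discretized with step `dt`, the Brownian increments are simulated with Python's `random` module, and the function name and parameter values are hypothetical. At each step the controls move by the minimum amount needed to return the content to $[0, S]$.

```python
import math
import random

def regulate_path(x0, mu, sigma, S, dt, n_steps, seed=0):
    """Discrete-time sketch of minimal input/output control keeping W in [0, S].

    Assumes x0 >= 0. Returns lists X, Y, Z, W on the time grid,
    with W = X + Y - Z at every grid point.
    """
    rng = random.Random(seed)
    X = [x0]
    Y = [0.0]
    Z = [max(x0 - S, 0.0)]           # initial output [X(0) - S]^+
    W = [min(x0, S)]
    for _ in range(n_steps):
        dx = mu * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        w = W[-1] + dx
        y, z = Y[-1], Z[-1]
        if w < 0.0:                  # inject just enough to return to zero
            y -= w
            w = 0.0
        elif w > S:                  # withdraw just enough to return to S
            z += w - S
            w = S
        X.append(X[-1] + dx)
        Y.append(y)
        Z.append(z)
        W.append(w)
    return X, Y, Z, W

# Illustrative parameters only.
X, Y, Z, W = regulate_path(x0=1.0, mu=0.2, sigma=1.0, S=1.5, dt=0.01, n_steps=5000)
```

On a grid the barriers are overshot slightly before being corrected, so this only approximates instantaneous reflection; the approximation improves as `dt` shrinks.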


Observe that $(Y^*, Z^*)$ is an $x$-admissible policy for every
starting state $x \ge 0$. We wish now to calculate

    $f(x) = E_x[R(Z^*) - kR(Y^*)]$,  $x \ge 0$.

As a first step, it is immediate from the construction that

(8)  $f(x) = (x - S) + f(S)$  for $x \ge S$.


Assuming now that $X(0) \in [0,S]$, we define

    $f_1(x) = E_x[\alpha \int_0^{T_1} e^{-\alpha t} Z_1(t)\,dt + e^{-\alpha T_1} Z_1(T_1)]$,  $0 \le x \le S$,

    $A(x) = E_x\, e^{-\alpha T_1}$,  $0 \le x \le S$.

Remembering that $Y^*(t) = 0$ if $0 \le t \le T_1$, it follows easily from our
construction, the strong Markov property of $X$, and the definition of
$R(\cdot)$ that

(9)  $f(x) = f_1(x) + \alpha E_x \int_{T_1}^\infty e^{-\alpha t}[Z^*(t) - Z^*(T_1) - kY^*(t)]\,dt$
     $\quad = f_1(x) + E_x\, e^{-\alpha T_1} f(W^*(T_1))$
     $\quad = f_1(x) + A(x) f(0)$,  $0 \le x \le S$.


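Now to solve for $f_1$ we recall that $W_1(t) = X(t) - Z_1(t)$ for $0 \le t \le T_1$
and $X(T_1) = Z_1(T_1)$ since $W_1(T_1) = 0$. Thus

(10)  $f_1(x) = \alpha E_x \int_0^{T_1} e^{-\alpha t}[X(t) - W_1(t)]\,dt + E_x\, e^{-\alpha T_1} X(T_1)$

for $0 \le x \le S$. Defining $H(x) = E_x \int_0^{T_1} e^{-\alpha t} W_1(t)\,dt$ for $0 \le x \le S$,
Proposition 3 and (10) give us

(11)  $f_1(x) = x + \mu[1 - A(x)]/\alpha - \alpha H(x)$,  $0 \le x \le S$.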


Recall that $T_1 = \inf\{t \ge 0 : W_1(t) = 0\}$ and that $W_1$ behaves as the
Brownian Motion $X$ with an upper reflecting barrier at $S$. Then standard
results for the first passage times and potentials of Markov processes
(in our case $W_1$) show that $A$ and $H$ satisfy the differential equations

(12)  $\Gamma A(x) - \alpha A(x) = 0$ and $x + \Gamma H(x) - \alpha H(x) = 0$,  $0 \le x \le S$,

with the boundary conditions

(13)  $A(0) = 1$, $H(0) = 0$, and $A'(S) = H'(S) = 0$.


Combining (9) - (13) we find that $f$ satisfies

(14)  $\Gamma f(x) - \alpha f(x) = 0$,  $0 \le x \le S$,  and  $f'(S) = 1$.

Furthermore, there is a precisely symmetric argument (defining a new
sequence of stopping times such that the initial period $[0, T_1]$
is one of input control only when $0 \le X(0) \le S$) to show that the second
boundary condition is

(15)  $f'(0) = k$.


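Using standard methods, the unique solution of (14) and (15) is found to
be

(16)  $f(x) = a\,e^{(\gamma-\beta)x} - b\,e^{-(\gamma+\beta)x}$,  $0 \le x \le S$,

where the constants $a$ and $b$ are chosen to satisfy the boundary
conditions $f'(0) = k$ and $f'(S) = 1$. Elementary computations then
give

(17)  $a = (e^{\beta S} - k e^{-\gamma S}) / [(\gamma-\beta)(e^{\gamma S} - e^{-\gamma S})]$,

(18)  $b = (k e^{\gamma S} - e^{\beta S}) / [(\gamma+\beta)(e^{\gamma S} - e^{-\gamma S})]$.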
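
The constants can be computed numerically. The sketch below rests on two assumptions that should be read as such. First, $\gamma - \beta$ and $-(\gamma + \beta)$ are taken to be the roots of $\frac{1}{2}\sigma^2 r^2 + \mu r - \alpha = 0$, that is $\beta = \mu/\sigma^2$ and $\gamma = \sqrt{\mu^2 + 2\alpha\sigma^2}/\sigma^2$, which is what makes the form (16) solve $\Gamma f - \alpha f = 0$. Second, $S$ is found from the condition $f''(S) = 0$ (which the proof of the next proposition identifies with equation (7)), cleared of positive factors. Function names and parameter values are illustrative.

```python
import math

def exponents(mu, sigma2, alpha):
    """beta and gamma such that gamma - beta and -(gamma + beta) solve
    0.5 * sigma2 * r**2 + mu * r - alpha = 0 (assumed definitions)."""
    beta = mu / sigma2
    gamma = math.sqrt(mu * mu + 2.0 * alpha * sigma2) / sigma2
    return beta, gamma

def coefficients(S, k, beta, gamma):
    """Constants a and b from f'(0) = k and f'(S) = 1, as in (17) and (18)."""
    denom = math.exp(gamma * S) - math.exp(-gamma * S)
    a = (math.exp(beta * S) - k * math.exp(-gamma * S)) / ((gamma - beta) * denom)
    b = (k * math.exp(gamma * S) - math.exp(beta * S)) / ((gamma + beta) * denom)
    return a, b

def solve_S(k, beta, gamma):
    """Bisection for the positive root of f''(S) = 0, written without the
    positive common factor so that the function is increasing in S."""
    p, m = gamma + beta, gamma - beta
    def h(S):
        return m * (math.exp(p * S) - k) - p * (k - math.exp(-m * S))
    lo, hi = 0.0, 1.0
    while h(hi) < 0.0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative parameters only.
mu, sigma2, alpha, k = 0.2, 1.0, 0.1, 1.5
beta, gamma = exponents(mu, sigma2, alpha)
S = solve_S(k, beta, gamma)
a, b = coefficients(S, k, beta, gamma)

def f(x):
    return a * math.exp((gamma - beta) * x) - b * math.exp(-(gamma + beta) * x)

def f1(x):   # first derivative of f
    return a * (gamma - beta) * math.exp((gamma - beta) * x) + b * (gamma + beta) * math.exp(-(gamma + beta) * x)

def f2(x):   # second derivative of f
    return a * (gamma - beta) ** 2 * math.exp((gamma - beta) * x) - b * (gamma + beta) ** 2 * math.exp(-(gamma + beta) * x)
```

With $S$ chosen this way, the computed function should satisfy $f'(0) = k$, $f'(S) = 1$, $f''(S) = 0$ and $f(S) = \mu/\alpha$ up to rounding.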


Proposition 8. The function $f : \mathbb{R}_+ \to \mathbb{R}$ is concave, increasing and
twice continuously differentiable with $f(S) = \mu/\alpha$. Furthermore,
$\Gamma f(x) - \alpha f(x) \le 0$, $1 \le f'(x) \le k$ and $f(x) \le (\mu/\alpha - S) + x$ for all
$x \ge 0$.
Proof: From (8) and (14) it is immediate that $f'$ exists and is continuous
on $\mathbb{R}_+$. For $0 \le x \le S$ we differentiate (16) twice to obtain

(19)  $f''(x) = a(\gamma-\beta)^2 e^{(\gamma-\beta)x} - b(\gamma+\beta)^2 e^{-(\gamma+\beta)x}$.

Setting this expression equal to zero, multiplying through by $\exp(\beta x)$,
and substituting (17) and (18) for $a$ and $b$, we find that $f''(x) = 0$
if and only if (7) holds with $x$ in place of $S$. Thus $f''(S) = 0$ with
$S$ chosen to satisfy (7). Clearly $f''(x) = 0$ for all $x > S$ by (8), so
$f''$ exists and is continuous. Since $|\beta| < \gamma$ it is clear from (18) that
$b > 0$. Since $f''(S) = 0$, it then follows from (19) that $a > 0$ and
hence that $f''$ is strictly increasing on $[0,S]$. Thus $f''(x) < 0$ for
$0 \le x < S$, and it follows from (8) that $f$ is strictly increasing and
concave on $\mathbb{R}_+$. Since $f'(0) = k$ and $f'(x) = 1$ for $x \ge S$, this
implies that $1 \le f'(x) \le k$ for all $x \ge 0$. With $f'(S) = 1$ and
$f''(S) = 0$, we have $\Gamma f(S) = \mu$. Since $\Gamma f - \alpha f = 0$ on $[0,S]$ this
gives us $f(S) = \mu/\alpha$. Then (8) yields $\Gamma f(x) = \mu$ for $x \ge S$, implying

    $\Gamma f(x) - \alpha f(x) = \mu - \alpha[\mu/\alpha + (x-S)] = -\alpha(x-S)$  for $x \ge S$,

so $\Gamma f - \alpha f \le 0$ on $\mathbb{R}_+$. Finally, $f(x) = (\mu/\alpha - S) + x$ for $x \ge S$
by (8), so the concavity of $f$ gives $f(x) \le (\mu/\alpha - S) + x$ for $x \ge 0$.

Remark. As the proof shows, $f'$ is continuous regardless of how one
chooses $S$, but $f''$ is continuous if and only if $S$ is chosen to
satisfy (7).
Theorem 1. If $x \ge 0$ and $(Y,Z)$ is an $x$-admissible policy, then
$E_x R(Z) - kE_x R(Y) \le f(x)$. Thus the policy $(Y^*, Z^*)$ constructed above
is $x$-optimal for all $x \ge 0$.


Proof: Let $x \ge 0$ and $(Y,Z)$ be fixed. When we speak of almost sure
convergence, this refers to $P_x$. The idea of the proof is essentially
to approximate $Y$ and $Z$ by (random) step functions. Given $\varepsilon > 0$,
we define an increasing sequence of stopping times $T_n$ (hoping the
reader will forgive this re-use of previous notation) by setting $T_0 \equiv 0$
and

    $T_{n+1} = \inf\{t \ge T_n : Y(t) \ge Y(T_n) + \varepsilon \text{ or } Z(t) \ge Z(T_n) + \varepsilon \text{ or } t = T_n + \varepsilon\}$

for $n \in I$. That each $T_n$ is a stopping time follows immediately from
the fact that $Y$ and $Z$ are non-anticipating functionals. Furthermore,
$0 < T_1 < T_2 < \cdots$ because $Y$ and $Z$ are right continuous. Finally,
$T_n \to \infty$ almost surely as $n \to \infty$, because both $R(Y)$ and $R(Z)$ are
almost surely finite. Let


    $y_0 = Y(0) + \varepsilon$,  $z_0 = Z(0)$,  $q_0 = z_0 - k y_0$,

    $A_0 = q_0 + f(X(0) + y_0 - z_0) - f(X(0))$,

and then for $n \in I$

    $y_{n+1} = Y(T_{n+1}) - Y(T_n)$,  $z_{n+1} = Z(T_{n+1}) - Z(T_n)$,

    $Y_n = \sum_{i=0}^n y_i = Y(T_n) + \varepsilon$,  $Z_n = \sum_{i=0}^n z_i = Z(T_n)$,

    $D_n = Y_n - Z_n$,  $q_n = z_n - k y_n$,

    $A_{n+1} = e^{-\alpha T_{n+1}}[q_{n+1} + f(X(T_{n+1}) + D_{n+1}) - f(X(T_{n+1}) + D_n)]$,

    $B_n = e^{-\alpha T_{n+1}} f(X(T_{n+1}) + D_n) - e^{-\alpha T_n} f(X(T_n) + D_n)$.


With these definitions, we have

(20)  $\sum_{i=0}^n A_i + \sum_{i=0}^{n-1} B_i = \sum_{i=0}^n e^{-\alpha T_i} q_i + e^{-\alpha T_n} f(X(T_n) + D_n) - f(X(0))$.

From our construction and the fact that $(Y,Z)$ is $x$-admissible it
follows that $X(T_n) + D_n \ge \varepsilon$ and $X(T_{n+1}) + D_n \ge 0$ almost surely for
all $n \in I$. Since $1 \le f'(x) \le k$, it is then immediate that $A_n \le 0$
almost surely for all $n \in I$. Furthermore, we can use Proposition 4 to
show $E_x B_n \le 0$ by making the following associations. Let $f$ be as
defined above for $x \ge 0$, and define $f(x)$ by (16) for $x < 0$. Then
Proposition 8 shows that $f$ satisfies the hypotheses of Proposition 4.
Let $T^* = T_{n+1}$, $T_* = T_n$ and $Z = D_n = Y(T_n) - Z(T_n) + \varepsilon$, so $Z$ is
measurable $\mathcal{F}_{T_*}$. Then Proposition 4 gives $E_x B_n \le 0$, it following
immediately from Propositions 6 and 8 that both expectations exist and
are finite. Thus, taking the expectation of both sides in (20), we
have


(21)  $E_x \sum_{i=0}^n e^{-\alpha T_i} q_i \le f(x) - E_x\, e^{-\alpha T_n} f(X(T_n) + D_n)$.

Noting again that $D_n = Y(T_n) - Z(T_n) + \varepsilon$, it follows from Propositions 7
and 8 that the second term on the right side of (21) goes to zero as
$n \to \infty$, giving us

(22)  $E_x \sum_{i=0}^\infty e^{-\alpha T_i} q_i \le f(x)$.

Let $Y^\#(t) = Y_n$ and $Z^\#(t) = Z_n$ for $T_n \le t < T_{n+1}$ and $n \in I$. From
our construction it follows that $(Y^\#, Z^\#)$ is an $x$-admissible policy
with

    $Y(t) \le Y^\#(t) \le Y(t) + \varepsilon$  and  $Z(t) - \varepsilon \le Z^\#(t) \le Z(t)$

for all $t \ge 0$, from which we have

(23)  $R(Z) - kR(Y) - (1+k)\varepsilon \le R(Z^\#) - kR(Y^\#) = \sum_{i=0}^\infty e^{-\alpha T_i} q_i \le R(Z) - kR(Y)$.

Thus $E_x[R(Z) - kR(Y)] \le f(x) + (1+k)\varepsilon$ by (22) and (23). Since $\varepsilon > 0$
was chosen arbitrarily, the proof is complete.
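
The step-function approximation used in the proof can be illustrated on sampled paths. This is a sketch under stated assumptions only: the controls are given as non-decreasing lists on a common time grid, the stopping times are taken to be grid points, and the function name and data are hypothetical. It builds the analogue of $(Y^\#, Z^\#)$ and lets one check the sandwich inequalities stated before (23).

```python
import math

def step_approximation(times, Y, Z, eps):
    """Grid version of the step functions (Y#, Z#) from the proof.

    A new block starts at the first grid point after the current block start
    where Y or Z has risen by at least eps, or eps time units have elapsed.
    On each block, Y# = Y(block start) + eps and Z# = Z(block start).
    """
    n = len(times)
    Y_sharp, Z_sharp = [0.0] * n, [0.0] * n
    start = 0                                   # index of the current T_n
    for i in range(n):
        if i > start and (Y[i] >= Y[start] + eps
                          or Z[i] >= Z[start] + eps
                          or times[i] >= times[start] + eps):
            start = i                           # this grid point plays the role of T_{n+1}
        Y_sharp[i] = Y[start] + eps
        Z_sharp[i] = Z[start]
    return Y_sharp, Z_sharp

# Illustrative non-decreasing controls on a grid (not from the paper).
times = [0.01 * i for i in range(500)]
Y = [0.3 * math.sqrt(t) for t in times]
Z = [t * t + (1.0 if t >= 2.0 else 0.0) for t in times]
eps = 0.1
Y_sharp, Z_sharp = step_approximation(times, Y, Z, eps)
```

Because $Y^\# \ge Y$ and $Z^\# \le Z$, the approximating policy keeps $X + Y^\# - Z^\#$ at least as large as $X + Y - Z$, which is why admissibility is preserved.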


4. The Control Problem with One Fixed Charge

For our second problem, we define an $x$-admissible policy ($x \ge 0$)
to be a pair of controls $Y$ and $Z$, where $Y$ is a step function in $C_x$
with $E_x R^*(Y) < \infty$ (Section 2) and $Z \in C_x$, such that (5) holds. We
say that an $x$-admissible policy $(Y^*, Z^*)$ is $x$-optimal ($x \ge 0$) if,
for every other $x$-admissible policy $(Y,Z)$,

    $E_x[R(Z^*) - kR(Y^*) - KR^*(Y^*)] \ge E_x[R(Z) - kR(Y) - KR^*(Y)]$.

As in the previous section, let $S$ be the unique positive solution
of (7), and let $a > 0$ and $b > 0$ be defined in terms of $S$ by (17)
and (18) respectively. (Recall that we showed $a > 0$ in the proof of
Proposition 8.) Now let $s$ be the unique positive solution of

(24)  $a[1 - e^{-(\gamma-\beta)s}] + b[e^{(\gamma+\beta)s} - 1] = K + ks$.

Elementary calculations show that the left side of (24) is strictly
convex and increasing with value zero at $s = 0$, and that its derivative
increases without bound as $s$ increases. Thus there is in fact a
unique solution $s$.
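
Equation (24) can be solved for $s$ by bisection once $S$, $a$ and $b$ are known. The sketch below is illustrative and rests on assumptions: it takes $\beta = \mu/\sigma^2$ and $\gamma = \sqrt{\mu^2 + 2\alpha\sigma^2}/\sigma^2$, finds $S$ from the condition $f''(S) = 0$ (identified with (7) in the proof of Proposition 8), and uses hypothetical function names and parameter values.

```python
import math

def bisect(func, lo, hi, iters=200):
    """Bisection for a root of func, assuming func(lo) < 0 < func(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if func(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def upper_bracket(func):
    """Double a trial point until func becomes non-negative."""
    hi = 1.0
    while func(hi) < 0.0:
        hi *= 2.0
    return hi

def solve_barriers(mu, sigma2, alpha, k, K):
    """Return (S, s, a, b, beta, gamma) for the fixed-charge problem."""
    beta = mu / sigma2
    gamma = math.sqrt(mu * mu + 2.0 * alpha * sigma2) / sigma2
    p, m = gamma + beta, gamma - beta           # both are positive

    def h(S):                                   # f''(S) = 0 without positive factors
        return m * (math.exp(p * S) - k) - p * (k - math.exp(-m * S))
    S = bisect(h, 0.0, upper_bracket(h))

    denom = math.exp(gamma * S) - math.exp(-gamma * S)
    a = (math.exp(beta * S) - k * math.exp(-gamma * S)) / (m * denom)   # (17)
    b = (k * math.exp(gamma * S) - math.exp(beta * S)) / (p * denom)    # (18)

    def gap(s):                                 # left side of (24) minus right side
        return a * (1.0 - math.exp(-m * s)) + b * (math.exp(p * s) - 1.0) - K - k * s
    s = bisect(gap, 0.0, upper_bracket(gap))
    return S, s, a, b, beta, gamma

# Illustrative parameters only.
k, K = 1.5, 0.1
S, s, a, b, beta, gamma = solve_barriers(mu=0.2, sigma2=1.0, alpha=0.1, k=k, K=K)
residual = a * (1.0 - math.exp(-(gamma - beta) * s)) + b * (math.exp((gamma + beta) * s) - 1.0) - K - k * s
```

The bracket search works because the difference between the two sides of (24) equals $-K < 0$ at $s = 0$, has zero slope there, and is strictly convex, so it crosses zero exactly once on $(0, \infty)$.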

With $X(0) \ge 0$, we set $\tau_0 = T_0 \equiv 0$ and then define $Z_1(t)$,
$W_1(t)$ and $\tau_1 = T_1$ in terms of $X$ and the positive constant $(s+S)$
exactly as they were defined in terms of $X$ and the positive constant
$S$ in the previous section. Then for $n = 2, 3, \ldots$ let

    $X_n(t) = s + X(T_{n-1} + t) - X(T_{n-1})$,  $t \ge 0$,

    $Z_n(t) = [\sup\{X_n(u) : 0 \le u \le t\} - (s+S)]^+$,  $t \ge 0$,

    $W_n(t) = X_n(t) - Z_n(t)$,  $t \ge 0$,

    $\tau_n = \inf\{t \ge 0 : W_n(t) = 0\}$  and  $T_n = T_{n-1} + \tau_n$.

Again it is easy to show that

    $P_x\{0 = T_0 \le T_1 < T_2 < \cdots \to \infty\} = 1$  for all $x \ge 0$,

and for $n \in I$ and $t \in [0, \tau_{n+1})$ we define

    $Y^*(T_n + t) = ns$,  $Z^*(T_n + t) = \sum_{i=1}^n Z_i(\tau_i) + Z_{n+1}(t)$,

and $W^*(T_n + t) = W_{n+1}(t)$, so that $W^*(t) = X(t) + Y^*(t) - Z^*(t)$ for all
$t \ge 0$. The behavior of the controlled process $W^*$ during the
initial interval $[0, T_1)$ is as described in Section 3 except that
now the upper reflecting barrier is at $(s+S)$. Each subsequent time
interval $[T_n, T_{n+1})$ begins with a jump of $s$ in the cumulative input
$Y^*$, this moving the controlled process $W^*$ from state zero to state $s$.
During the remainder of the period, $Y^*$ remains constant and the cumulative
output $Z^*$ increases in the minimum amounts necessary to maintain
$W^* \le (s+S)$. The period ends when $W^*$ hits zero and jumps upward by $s$
again.


Using arguments very much like those employed in the previous
section, it is straightforward to show that the function

    $g(x) = E_x[R(Z^*) - kR(Y^*) - KR^*(Y^*)]$,  $x \ge 0$,

satisfies $\Gamma g(x) - \alpha g(x) = 0$ for $0 \le x \le s+S$, with $g'(s+S) = 1$,
$g(0) = g(s) - K - ks$, and $g(x) = g(s+S) + (x-S-s)$ for $x \ge s+S$.
Again the unique solution has the form

(25)  $g(x) = a_0\,e^{(\gamma-\beta)x} - b_0\,e^{-(\gamma+\beta)x}$  for $0 \le x \le s+S$,

where the constants $a_0$ and $b_0$ are selected to satisfy the boundary
conditions $g'(s+S) = 1$ and $g(0) = g(s) - K - ks$. The reader may
easily verify that, with $s$ chosen to satisfy (24), the selection

(26)  $a_0 = a\,e^{-(\gamma-\beta)s}$  and  $b_0 = b\,e^{(\gamma+\beta)s}$

meets the boundary condition $g(0) = g(s) - K - ks$. We then further
have (defining $f$ by (16))

(27)  $g(s+x) = a\,e^{(\gamma-\beta)x} - b\,e^{-(\gamma+\beta)x} = f(x)$  for $0 \le x \le S$.

In Section 3 we showed that $f'(S) = 1$, so our second boundary condition
$g'(s+S) = 1$ is also satisfied, and the complete solution for $g$ is
given by (25), (26) and $g(s+S+x) = g(s+S) + x$ for $x \ge 0$.


Proposition 9. The function $g : \mathbb{R}_+ \to \mathbb{R}$ is concave, increasing and
twice continuously differentiable with $g(s+S) = \mu/\alpha$. Furthermore,
$\Gamma g(x) - \alpha g(x) \le 0$, $g'(x) \ge 1$ and $g(x) \le (\mu/\alpha - s - S) + x$ for all
$x \ge 0$. Finally, $g(x+y) - g(x) \le K + ky$ for all $x \ge 0$ and $y \ge 0$.

Proof: Clearly $a_0 > 0$ and $b_0 > 0$, and differentiating (25) twice
gives

(28)  $g''(x) = a_0(\gamma-\beta)^2 e^{(\gamma-\beta)x} - b_0(\gamma+\beta)^2 e^{-(\gamma+\beta)x}$,  $0 \le x \le s+S$,

so $g''$ is strictly increasing on $[0, s+S]$. We showed in the proof
of Proposition 8 that $f(S) = \mu/\alpha$, $f'(S) = 1$ and $f''(S) = 0$, so (27)
gives us $g(s+S) = \mu/\alpha$, $g'(s+S) = 1$ and $g''(s+S) = 0$. It then follows
as in the proof of Proposition 8 that $g$ is concave, increasing and
twice continuously differentiable with $\Gamma g - \alpha g \le 0$. Since $g'(x) = 1$ for
$x \ge s+S$, the concavity implies $g'(x) \ge 1$ for all $x \ge 0$. Also, we
showed in Section 3 that $f'(0) = k$, so (27) gives $g'(s) = k$. From
the concavity of $g$ we then have

(29)  $g(y) - g(0) - ky \le g(s) - g(0) - ks = K$  for all $y \ge 0$.

Finally, the concavity of $g$ implies that $g(x+y) - g(x)$ is a
non-increasing function of $x$ for all $y \ge 0$. Combining this with (29)
proves the last statement of the proposition.


Theorem 2. If $x \ge 0$ and $(Y,Z)$ is an $x$-admissible policy, then
$E_x[R(Z) - kR(Y) - KR^*(Y)] \le g(x)$. Thus the policy $(Y^*, Z^*)$ constructed
above is $x$-optimal for all $x \ge 0$.


Proof: The proof is very similar to that of Theorem 1 and we shall
only sketch it. We define the sequence of stopping times $T_n$ (again
apologizing for the notation) by $T_0 = 0$ and

    $T_{n+1} = \inf\{t \ge T_n : Y(t) > Y(T_n) \text{ or } Z(t) \ge Z(T_n) + \varepsilon \text{ or } t = T_n + \varepsilon\}$

for $n \in I$. We proceed exactly as in the proof of Theorem 1
except that $f$ is replaced by $g$ throughout, we take $y_0 = Y(0)$,
and we define

    $q_n = z_n - k y_n - K\,1_{\{y_n > 0\}}$.

To show that $A_n \le 0$ almost surely with this change in the definition of
$q_n$, we use the fact that $g' \ge 1$ and $g(x+y) - g(x) \le K + ky$ by
Proposition 9. That $E_x B_n \le 0$ for all $n$ follows exactly as before.
Defining $Y^\#(t) = Y_n$ and $Z^\#(t) = Z_n$ for $T_n \le t < T_{n+1}$ and $n \in I$,
we arrive at

    $E_x[R(Z^\#) - kR(Y^\#) - KR^*(Y^\#)] = E_x \sum_{i=0}^\infty e^{-\alpha T_i} q_i \le g(x)$.

But $Y^\#(t) = Y(t)$ and $Z(t) - \varepsilon \le Z^\#(t) \le Z(t)$ for all $t \ge 0$, so the
desired result follows directly.


5. Applications

Consider an inventory and production system involving a single
type of item (product), and assume that the cumulative excess production
of the item can be reasonably represented by the Brownian Motion
$X = \{X(t),\ t \ge 0\}$. We have in mind a situation where there is a
non-decreasing cumulative production process $P = \{P(t),\ t \ge 0\}$ and a
non-decreasing cumulative demand process $D = \{D(t),\ t \ge 0\}$ such that $P - D$
can be approximated by $X$. We interpret $P$ as the production from
regular operations and assume that additional instantaneous increases
in the stock level can be accomplished by some irregular means such
as overtime production or ordering from an outside vendor at a cost of
$k > 0$ dollars per item. We interpret $D$ as the demand from regular
customers and assume that unlimited additional quantities of excess
stock can be sold by irregular means (such as sale to a scavenger as
scrap) at a price of $c$ dollars per item, where $0 \le c < k$. Let
$Y = \{Y(t),\ t \ge 0\}$ and $Z = \{Z(t),\ t \ge 0\}$ denote the cumulative
irregular production and cumulative irregular sales respectively.
Assuming that all regular demand must be met instantaneously (no
backlogging), the controls $Y$ and $Z$ must be chosen so that the stock
level $W(t) = X(t) + Y(t) - Z(t)$ is non-negative for all $t \ge 0$.
Assuming that inventory holding costs are continuously incurred at
rate $h(W(t))$ and that future costs and revenues are continuously
discounted at interest rate $\alpha > 0$, we wish to choose $Y$ and $Z$ so
as to maximize

    $E_x[cR(Z) - kR(Y) - \int_0^\infty e^{-\alpha t} h(W(t))\,dt]$.

If $h(x) = hx$ for all $x \ge 0$, then the last term inside the brackets
is just

    $h \int_0^\infty e^{-\alpha t}[X(t) + Y(t) - Z(t)]\,dt = (h/\alpha)[R(Y) - R(Z)] + h \int_0^\infty e^{-\alpha t} X(t)\,dt$.

Observing that the last term is uncontrollable (depends on neither $Y$
nor $Z$) we then see that the total objective is to maximize
$(c + h/\alpha)\,E_x R(Z) - (k + h/\alpha)\,E_x R(Y)$. Assuming without loss of
generality that $c + h/\alpha = 1$, this is precisely the problem formulated
in Section 3. If the irregular production process $Y$ is a step function
and if a set-up cost of $K > 0$ is incurred each time that a jump in the
irregular production occurs, then we similarly obtain the problem
formulated in Section 4.
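
The reduction to the Section 3 problem amounts to dividing the objective by $c + h/\alpha$. A minimal sketch of that normalization follows; the function name and the numerical values are assumptions made for the example, and the fixed charge is rescaled by the same factor on the understanding that the whole objective is divided by a positive constant.

```python
def normalize_inventory_costs(c, k, h, alpha, K=0.0):
    """Rescale the inventory problem so that the unit reward for output is 1.

    Returns (k_normalized, K_normalized), where
    k_normalized = (k + h/alpha) / (c + h/alpha) and
    K_normalized = K / (c + h/alpha).
    """
    unit_reward = c + h / alpha       # coefficient of E R(Z)
    unit_cost = k + h / alpha         # coefficient of E R(Y)
    if unit_reward <= 0.0:
        raise ValueError("c + h/alpha must be positive to normalize")
    return unit_cost / unit_reward, K / unit_reward

# Illustrative numbers only.
k_norm, K_norm = normalize_inventory_costs(c=0.5, k=2.0, h=0.05, alpha=0.1, K=3.0)
```

Since $c < k$, the normalized proportional cost exceeds one, which is the standing assumption $k > 1$ of Section 3.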

A closely related diffusion model of optimal inventory control
has been advanced by Bather (1966). In our notation, Bather assumes
$\mu < 0$ and $K > 0$, and he considers a linear holding cost function $h(\cdot)$.
There is no provision for sale of excess inventory in his model
($Z(\cdot) \equiv 0$). The stock level is permitted to go negative (backlogging
is permitted), but linear shortage costs are incurred when this happens.
Attention is restricted to a simple class of input step functions $Y$
which jump by a fixed amount $(S-s)$ whenever the inventory on hand
decreases to a fixed level $s$, the objective being to minimize average
cost per unit time over an infinite planning horizon. Other diffusion
formulations for problems of optimal control of dams, inventories and
storage systems are given by Bather (1968), Faddy (1974a,b), Whitt (1973a,b),
Puterman, and Pliska. In all of those papers, attention
is restricted to non-randomized and Markov (stationary) policies, and
in all but the last there is a further restriction to stationary policies
having a particular structure.

As a second application we consider the stochastic cash management
problem, discrete-time versions of which have been studied by Eppen and
Fama, Girgis (1968), and Neave (1970). Imagine a firm which
maintains a cash fund into which a certain amount of income or revenue
is automatically channeled and out of which operating disbursements are
made. We assume that the resulting fluctuations in the content of the
fund can be adequately represented by the Brownian Motion $X$. Additional
instantaneous increases in the content of the fund can be accomplished
by converting securities into cash, but there is a transaction cost of
$k \ge 0$ dollars incurred for each dollar of securities so converted.
Also, cash from the fund can be converted into securities at a
transaction cost of $c \ge 0$ dollars per dollar so converted. Finally, an
opportunity loss of $h > 0$ dollars per unit time is suffered for each
dollar that is held within the fund. Denoting by $Y(t)$ the cumulative
conversion of securities to cash up to time $t$, and by $Z(t)$ the cumulative
conversion of cash to securities, we assume that the content of the fund
must be kept non-negative. Then the problem is to find an admissible
policy $(Y,Z)$ that maximizes $(h/\alpha - c)\,E_x R(Z) - (h/\alpha + k)\,E_x R(Y)$, where
the linear opportunity cost (holding cost) has been converted as in the
previous example. If $c < h/\alpha$ and $c + k > 0$ then we may assume without
loss of generality that $h/\alpha - c = 1$, and we have precisely the problem
formulated in Section 3. If the conversion of securities to cash entails
both a fixed charge $K > 0$ and the proportional charge $k$, then we get
the problem formulated in Section 4.

Constantinides (1976) has examined both of these cash management
problems  with  the  objective  of  minimizing  average  cost  (rather  than 
expected  discounted  cost)  over  an  infinite  planning  horizon,  and  he  has 
further  considered  the  case  where  both  types  of  conversion  entail  both 
fixed  and  proportional  transaction  costs.  (See  Section  6.)  His  results 
are  very  similar  to  ours,  but  the  methodology  employed  is  quite  different, 
and  we  do  not  fully  understand  his  proofs. 

For both inventory control problems and stochastic cash management
problems, one finds that the diffusion formulations discussed above are
much more tractable than more traditional (usually discrete review)
models. For our particular problems, we have shown that the optimal
policy (from a very broad class of potential policies) has an extremely
simple structure, and the assumption of an underlying Brownian Motion
further permits explicit determination of the relevant critical numbers.
With traditional formulations, even the structural results may fail and
the computation of optimal policies is typically a complicated matter.
See Girgis (1968) for a demonstration of this in the case of cash
management.

On the other side of the issue, it simply may not be reasonable to
represent the underlying (net demand or net production) process by a
Brownian Motion. Bather (1966) suggested that a non-decreasing demand
process be approximated by Brownian Motion with positive drift, but as
Whitt (1973a) has pointed out, this is not a circumstance where one
expects a good approximation. In each of our applications, we have
emphasized problems where the underlying process X represents the
difference of two non-decreasing processes, and in this circumstance
various theorems on the (weak) convergence of stochastic processes may
be invoked (with further assumptions on the parameters of the relevant
processes) to justify the Brownian approximation. See, for example,
Harrison (1975). Even when this can be done, there remains the problem
of justifying one's diffusion optimization problem as a reasonable
approximation to the original optimization problem. If one restricts
attention to policies which are explicit functionals (and continuous in
the appropriate function space topology) of the underlying process, we
believe that this might be a manageable task in the rather restrictive
setting of our model, but the issue will not be pursued further here.




6. Concluding Remarks


If for each t > 0 we define 𝒜_t to be the sub-σ-algebra generated by
{X(u), 0 < u < t}, then it follows from the stationary, independent
increments property of X that X(t+u) - X(t) is independent of 𝒜_t for
all t > 0 and u > 0. Thus we may take ℱ_t = 𝒜_t in our formulation
(Section 2), and in general we must have 𝒜_t ⊂ ℱ_t for all t > 0. If a
control policy (Y,Z) has the property that Y(t) and Z(t) are measurable
𝒜_t for all t > 0, we shall call it a non-randomized policy. Given a
non-randomized policy (Y,Z), let 𝒴_t be the σ-algebra generated by
X(t) + Y(t) - Z(t) for t > 0. We shall say that (Y,Z) is a Markov policy
if Y(t+u) and Z(t+u) are measurable 𝒴_t for all t > 0 and u > 0.
Finally, we shall say that a Markov policy (Y,Z) is stationary if the
conditional distribution of {Y(t+u) - Y(t), Z(t+u) - Z(t); u > 0} given
𝒜_t depends only on X(t) + Y(t) - Z(t) and not on t (t > 0). All of this
terminology conforms with the standard usage in (discrete time) dynamic
programming, appropriately adapted to our setting. Roughly speaking, a
policy (Y,Z) is non-randomized if the controls applied up to time t
depend only on the history of the underlying process X up to time t and
not on any other "irrelevant" information contained in ℱ_t. It is Markov
if the controls to be applied after time t depend on the history up to
time t only through the current "state of the system"
W(t) = X(t) + Y(t) - Z(t), and it is stationary if this dependence on
W(t) does not involve t. As the reader may easily verify, the policies
that we have shown to be optimal for our two problems are both
(non-randomized and Markov) stationary policies.
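As a rough illustration of this terminology (and not a construction taken
from the paper), the Python sketch below applies a two-barrier rule to a
discretized Brownian path. The function names, the time step dt, and the
upper barrier `upper` are assumptions made for this example only; the
critical numbers of Sections 3 and 4 are not computed here. The point is
that the control increments are computed from the current state w alone,
with no dependence on t or on the earlier history, which is what the
words "Markov" and "stationary" require.

```python
import random


def stationary_barrier_step(w, upper):
    """Return (dY, dZ) as a function of the current state w only.

    Hypothetical two-barrier rule used for illustration: push the state
    back to 0 from below and back to `upper` from above.
    """
    d_y = max(0.0, -w)          # input just large enough to restore w >= 0
    d_z = max(0.0, w - upper)   # output just large enough to restore w <= upper
    return d_y, d_z


def simulate(x0, drift, sigma, upper, dt, n_steps, seed=0):
    """Euler-type simulation of W = X + Y - Z under the rule above."""
    rng = random.Random(seed)
    w = x0
    y_total = 0.0
    z_total = 0.0
    path = []
    for _ in range(n_steps):
        # Uncontrolled Brownian increment over a step of length dt.
        w += drift * dt + sigma * dt ** 0.5 * rng.gauss(0.0, 1.0)
        # The control sees only the current state, not t or the past.
        d_y, d_z = stationary_barrier_step(w, upper)
        w += d_y - d_z
        y_total += d_y
        z_total += d_z
        path.append(w)
    return path, y_total, z_total
```

Because `stationary_barrier_step` takes neither the time index nor the
path history as an argument, any policy written in this form is
non-randomized, Markov, and stationary in the discretized sense.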

A natural successor for the two problems considered in this paper is one
where both the input control Y and the output control Z must be step
functions, and there are (different) fixed costs associated with both
input jumps and output jumps. Similar (and much more complex) problems of
pure impulse (jump) control have been considered by Bensoussan and Lions
(1973, 1975) and by Richard (1976), but the method of proof used here can
also be extended to the case of two fixed charges. As we shall
demonstrate in a future paper, there exists an optimal solution for this
problem which involves only three critical numbers, but the computation
of those critical numbers is quite complicated. If we allow one
(respectively, both) of the fixed charges to approach zero, we find that
the optimal controls approach those displayed in Section 4 (respectively,
Section 3) almost surely. Thus, roughly speaking, each of the problems
treated here can be obtained as the limit of problems involving two fixed
charges.

Our problem with no fixed charges (Section 3) can also be approximated by
a formulation of the type considered by Mandl (1968) and Pliska (1973).
Suppose that the controls Y and Z are both required to be absolutely
continuous and non-decreasing, with densities bounded by c > 0. We cannot
then require that W(t) = X(t) + Y(t) - Z(t) remain positive, but we
suppose that a large penalty cost of M (dollars per unit time) is
continuously incurred so long as W(t) < 0. It can then be shown that
there are critical numbers a and b with 0 < a < b < ∞ such that one
optimal policy is the following. When W(t) < a, the controller increases
Y at the maximum permissible rate c; when W(t) > b he increases Z at
rate c; and when a < W(t) < b he does nothing. If we let c → ∞, we find
that a → 0, b → S and the optimal controls converge almost surely to
those displayed in Section 3.
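The bounded-rate rule just described can be sketched in a few lines of
Python. This is a hedged illustration only: the numerical values of a, b
and c are free inputs here (the paper's critical numbers are not
computed), the function names and the Euler time step dt are assumptions
of the example, and the time spent below zero is accumulated without
discounting simply to show where the penalty M would apply.

```python
import random


def rate_policy(w, a, b, c):
    """Return (dY/dt, dZ/dt) for the bounded-rate rule with thresholds a < b."""
    if w < a:
        return c, 0.0      # raise the content at the maximum rate c
    if w > b:
        return 0.0, c      # lower the content at the maximum rate c
    return 0.0, 0.0        # between a and b: do nothing


def simulate_bounded_rate(x0, drift, sigma, a, b, c, dt, n_steps, seed=0):
    """Euler simulation of W under rate_policy; also tracks time with W < 0."""
    rng = random.Random(seed)
    w = x0
    time_negative = 0.0    # undiscounted time on which the penalty M would accrue
    path = []
    for _ in range(n_steps):
        y_rate, z_rate = rate_policy(w, a, b, c)
        noise = sigma * dt ** 0.5 * rng.gauss(0.0, 1.0)
        w += (drift + y_rate - z_rate) * dt + noise
        if w < 0.0:
            time_negative += dt
        path.append(w)
    return path, time_negative
```

Unlike a barrier rule, this policy cannot prevent W from becoming
negative, because the controls can move the state only at a finite rate;
that is why the formulation needs the penalty cost M.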

Having indicated in the last two paragraphs that our problems can be
approximated in either of two ways, we emphasize that either type of
approximate formulation is harder to solve than the problems as we have
stated them. Also, we repeat that the optimal controls displayed in
Sections 3 and 4 are neither absolutely continuous nor step functions.
We are not aware of any previous formulation of a stochastic control
problem which permits controls of the type that we have found to be
optimal.